Learning Differentiable Programs with Admissible Neural Heuristics (1/24)
We first discussed some possibilities for adding LLMs to the picture:
- The current architecture of the neural heuristic is an FFN/RNN; would there be a benefit to making it a language model?
- Could more constructs (beyond just if-then-else) be learned? Could performance be improved further?
While doing so, we talked about common assumptions about LLMs:
- Perplexity is probably not a good predictor of downstream performance, in both NLP and code
- [shashank edit] This was the work on perplexity I was referring to: https://arxiv.org/abs/2210.12365. Table 4, column 4 reports perplexity. Row 1 shows the perplexity of a set of sentences created by a human expert; rows 4-8 show that the perplexity of a neurally generated solution can be lower than that of humans. A couple of things to note, though:
- They don't actually evaluate the neurally generated solutions against experts to check whether the solutions are valid, which is a big gap in their validation, so I would factor that in when interpreting these results.
- The perplexity of the human set is not very different from that of the neurally generated set; both are relatively low. It then comes down to setting a threshold for what counts as "low".
- What Alex was referring to was likely this: https://arxiv.org/abs/2203.07814. Figure A5 (page 48) shows that the decoder temperature (which controls the perplexity of the samples) has no effect on the model's solve rate. (A minimal sketch of how perplexity is computed follows this list.)
- Since LLMs for code are trained on "clean" data, would it be fair to assume they don't generate trivial bugs?
- How could we use LLMs to prune the search space the way this work does?
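For concreteness, here is a minimal sketch of how perplexity is typically computed for a piece of generated text or code under a causal language model. It uses the HuggingFace transformers API with gpt2 as a stand-in model (our choice for illustration; the papers above use different, much larger models):

```python
# Minimal sketch: perplexity of a candidate string under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model, chosen here only for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean token-level
        # cross-entropy; perplexity is its exponential.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("def add(a, b):\n    return a + b"))
```

Note that a lower value only means the model finds the text more predictable; as discussed above, it says nothing about whether the text is a valid solution.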
We also discussed why we need a neural admissible relaxation instead of using some purely symbolic heuristic-based relaxation:
- The neural relaxation is used while learning a program that fits the input-output examples; however, symbolic heuristics could perhaps still be considered for pruning branches. (A sketch of the neural-relaxation idea follows this list.)
- We discussed the challenges that would come from using a bigger DSL.
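To summarize the mechanism we were discussing, here is a minimal sketch of NEAR-style informed search with a neural admissible heuristic. This is our own illustration, not the authors' code: `fill_holes_with_neural_nets`, `train`, and `loss` are hypothetical helpers, and partial programs are assumed to expose `is_complete()` / `expand()`:

```python
# Sketch of informed search over partial programs with a neural heuristic.
import heapq
import itertools

tie = itertools.count()  # tie-breaker so the heap never compares program objects

def heuristic(partial_program, examples):
    # Replace every unfinished hole with a small neural net and fit it to the
    # data. Since the nets are more expressive than any symbolic completion of
    # the holes, the fitted loss is an (approximately) admissible, i.e.
    # optimistic, estimate of the best completion's cost. (NEAR additionally
    # adds a structural-cost term for program size, omitted here.)
    relaxed = fill_holes_with_neural_nets(partial_program)  # hypothetical helper
    train(relaxed, examples)                                # hypothetical helper
    return loss(relaxed, examples)                          # hypothetical helper

def near_search(root, examples):
    frontier = [(heuristic(root, examples), next(tie), root)]
    while frontier:
        h, _, prog = heapq.heappop(frontier)
        if prog.is_complete():
            return prog  # with an admissible h, the first complete pop is (near-)optimal
        for child in prog.expand():  # fill one hole with a DSL construct
            heapq.heappush(frontier, (heuristic(child, examples), next(tie), child))
```

This also shows why the relaxation must over-approximate the DSL: a symbolic heuristic that could underestimate the achievable loss on some branch would break the optimality guarantee, though it might still be usable for pruning.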
Finally, we talked about the difficulties of supporting loops in the DSL.
- Someone brought up a Martin Vechev paper that supports loops ← [edit: shashank] The Vechev paper I was referring to was actually quite different from what we were discussing. It essentially defines NN primitives like tanh in terms of abstract domains. It came to mind because it translates concepts like tanh into another algebra (that of abstract domains), whereas here we are interested in translating program primitives like loops into another algebra: linear algebra.
The paper I was thinking of: https://ggndpsngh.github.io/files/DeepPoly.pdf (see Section 4).
What I should rather have had in mind was TerpreT, which does define loops as differentiable objects (see the sketch at the end of these notes).
- We mentioned TerpreT and Stitch (loop support coming soon?)
- Other related work: https://arxiv.org/abs/1611.02109
- Eric Zhang's excellent notes from Nada Amin's course on PL+AI, which cover TerpreT and a bunch more: https://www.ekzhang.com/assets/pdf/CS_252r_Notes.pdf
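As promised above, here is a minimal sketch, in the spirit of TerpreT, of one standard way to make a bounded loop differentiable: unroll the body to a fixed bound T and blend the iterates with a learned, softmax-normalized halting distribution. This is our own toy (all names are ours), not TerpreT's actual compilation scheme:

```python
# Toy differentiable bounded loop: unroll to T steps, mix with a learned
# halting distribution so the "iteration count" becomes a soft, trainable choice.
import torch

T = 8  # fixed unrolling bound; TerpreT likewise requires loop bounds to be known

def soft_loop(state, body, halt_logits):
    """Apply `body` up to T times; return the halting-weighted mix of iterates.

    halt_logits is a learnable tensor of shape (T + 1,); its softmax gives the
    probability that the loop "stops" after t iterations, a differentiable
    relaxation of choosing a discrete iteration count.
    """
    weights = torch.softmax(halt_logits, dim=0)
    mix = weights[0] * state
    for t in range(1, T + 1):
        state = body(state)
        mix = mix + weights[t] * state
    return mix

# Toy usage: learn how many times to apply "+1" so that 0 becomes 5.
halt_logits = torch.zeros(T + 1, requires_grad=True)
opt = torch.optim.Adam([halt_logits], lr=0.1)
for _ in range(200):
    out = soft_loop(torch.tensor(0.0), lambda s: s + 1.0, halt_logits)
    loss = (out - 5.0) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

expected_steps = (halt_logits.softmax(0) * torch.arange(T + 1.0)).sum()
print(expected_steps.item())  # converges to roughly 5.0
```

The design choice is the usual one: a discrete quantity (the iteration count) is replaced by a distribution over its possible values so that the loss is differentiable in the loop "parameters"; TerpreT applies the same kind of relaxation to every discrete choice in a program.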